This notebook shows how to do handwritten character recognition with logistic regression. I have adapted this example from an example by Aymeric Damien. He has a lot of nice notebooks discussing TensorFlow at
import gzip
import pickle
import random
import numpy as np
import matplotlib.pyplot as plt
The function $\texttt{vectorized_result}(d)$ takes a digit $d \in \{0,\cdots,9\}$ and returns a NumPy vector $\mathbf{x}$ of shape $(10,)$ such that $$ \mathbf{x}[i] = \left\{ \begin{array}{ll} 1 & \mbox{if $i = d$;} \\ 0 & \mbox{otherwise.} \end{array} \right. $$ This function is used to convert a digit $d$ into the expected output of a model that has an output unit for every digit.
def vectorized_result(d):
    e = np.zeros((10, ), dtype=np.float32)
    e[d] = 1.0
    return e
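As a quick check of the encoding described above, the following sketch builds a one-hot vector and recovers the digit with `np.argmax`. The name `one_hot` and the parameter `num_classes` are illustrative assumptions and not part of the notebook.

```python
import numpy as np

def one_hot(d, num_classes=10):
    """Return a float32 vector of shape (num_classes,) with 1.0 at index d."""
    e = np.zeros((num_classes,), dtype=np.float32)
    e[d] = 1.0
    return e

v = one_hot(3)
decoded = int(np.argmax(v))  # index of the single 1.0 entry
print(v.shape, decoded)
```

Decoding with `argmax` is the same operation that is later used to turn a vector of predicted probabilities back into a digit.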
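The function $\texttt{load_data}()$ returns a tuple of the form $$ (\texttt{X_train}, \texttt{X_test}, \texttt{Y_train}, \texttt{Y_test}) $$ where $\texttt{X_train}$ and $\texttt{X_test}$ are matrices whose rows are the images flattened into vectors of length $784$, while $\texttt{Y_train}$ and $\texttt{Y_test}$ are matrices whose rows are the corresponding labels, encoded by $\texttt{vectorized_result}$ as vectors of length $10$.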
def load_data():
    # The archive contains three (images, labels) pairs; the validation set is not used here.
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        train, validate, test = pickle.load(f, encoding="latin1")
    X_train = np.array([np.reshape(x, (784, )) for x in train[0]])
    X_test  = np.array([np.reshape(x, (784, )) for x in test [0]])
    Y_train = np.array([vectorized_result(y) for y in train[1]])
    Y_test  = np.array([vectorized_result(y) for y in test [1]])
    return (X_train, X_test, Y_train, Y_test)
X_train, X_test, Y_train, Y_test = load_data()
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
The function $\texttt{show_digits}(\texttt{rows}, \texttt{columns}, \texttt{offset})$ shows $\texttt{rows} \cdot \texttt{columns}$ images of the training data. The first image shown is the image at index $\texttt{offset}$.
def show_digits(rows, columns, offset=0):
    f, axarr = plt.subplots(rows, columns)
    for r in range(rows):
        for c in range(columns):
            i     = r * columns + c + offset
            image = 1 - X_train[i,:]
            image = np.reshape(image, (28, 28))
            axarr[r, c].imshow(image, cmap="gray")
            axarr[r, c].axis('off')
show_digits(5, 12)
import tensorflow as tf
In order to avoid a bug we have to set the following environment variable.
We create placeholders to use for the data. Below, $\texttt{None}$ stands for the yet unknown number of training examples.
X = tf.placeholder(tf.float32, [None, 784]) # mnist data image of shape 28*28=784
Y = tf.placeholder(tf.float32, [None, 10]) # 0-9 digits recognition => 10 classes
Next, we create variables for the weights and biases. The variable $\texttt{W}$ is the weight matrix, while $\texttt{b}$ is the bias vector.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
We construct the model for logistic regression. $\texttt{Y_pred}$ is the prediction vector. We use the softmax activation function. For a $d$-dimensional vector $\mathbf{z}$, this function is defined as
$$ \sigma(\mathbf{z})_i := \frac{\exp(z_i)}{\;\displaystyle\sum\limits_{j=1}^d \exp(z_j)\;} $$
This function is predefined in TensorFlow. Here, the vector $\mathbf{z}$ is defined as
$$ \mathbf{z} = \mathbf{x} \cdot W + \mathbf{b} $$
where $\mathbf{x}$ is a single image, i.e. one row of $\texttt{X}$.
Y_pred = tf.nn.softmax(tf.matmul(X, W) + b)
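To make the softmax definition above concrete, here is a minimal NumPy sketch that applies it row by row. It is not how TensorFlow implements `tf.nn.softmax`; subtracting the row maximum before exponentiating is an assumption added here to avoid overflow and does not change the result. The names `softmax`, `W0` and `b0` are illustrative.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax of a 2-D array z of shape (n, d)."""
    shifted = z - np.max(z, axis=1, keepdims=True)  # same result, avoids overflow
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((5, 784))      # five stand-in "images"
W0 = np.zeros((784, 10))      # zero weights, like the initial W
b0 = np.zeros((10,))          # zero biases, like the initial b
p = softmax(x @ W0 + b0)      # b0 is broadcast over the five rows
print(p.shape)
```

With all weights and biases equal to zero, every class receives the same probability, so before training the model predicts $1/10$ for each digit.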
We use the cross entropy as a cost function. For a single example, this is defined as $$ -\sum\limits_{i=1}^d \mathtt{Y}_i \cdot \ln(\mathtt{Y\_pred}_i) $$ Here, $\mathtt{Y}_i$ is the expected outcome, while $\mathtt{Y\_pred}_i$ is the output predicted by our model. The cost is the mean of this quantity over all examples that are fed into the model.
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(Y_pred), reduction_indices=1))
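The cost can be checked by hand on a tiny example. The sketch below mirrors the formula in NumPy; the function name `cross_entropy` and the clipping constant `eps` are assumptions added for illustration (the clipping guards against $\ln(0)$ and is not part of the cost defined above).

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over examples of -sum_i y_true[i] * ln(y_pred[i])."""
    y_pred = np.clip(y_pred, eps, 1.0)  # assumption: guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two examples with three classes each; the true classes are 1 and 0.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.2, 0.7, 0.1],
                   [0.5, 0.25, 0.25]])
loss = cross_entropy(y_true, y_pred)
print(loss)
```

Since the labels are one-hot, only the probability assigned to the true class contributes, so the result equals $-(\ln 0.7 + \ln 0.5)/2$.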
We set some hyperparameters. We will use stochastic gradient descent with a minibatch size of $100$.
learning_rate = 0.05
training_epochs = 50
batch_size = 100
num_examples = X_train.shape[0]
We use stochastic gradient descent to minimize this cost function.
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
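TensorFlow derives the gradients automatically. To illustrate what a single update amounts to, the following NumPy sketch uses the standard closed-form gradient of the mean cross entropy for softmax regression, namely $X^\top (P - Y) / n$ for the weights and the column means of $P - Y$ for the biases. All names and the toy data are assumptions for illustration; this is not the code that `GradientDescentOptimizer` runs internally.

```python
import numpy as np

def softmax(z):
    shifted = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

def mean_cross_entropy(W, b, X, Y):
    P = softmax(X @ W + b)
    return float(np.mean(-np.sum(Y * np.log(P), axis=1)))

def sgd_step(W, b, X_batch, Y_batch, learning_rate):
    """One gradient descent update on a single minibatch."""
    n = X_batch.shape[0]
    P = softmax(X_batch @ W + b)            # predicted probabilities
    grad_W = X_batch.T @ (P - Y_batch) / n  # gradient w.r.t. the weights
    grad_b = np.mean(P - Y_batch, axis=0)   # gradient w.r.t. the biases
    return W - learning_rate * grad_W, b - learning_rate * grad_b

# Toy problem: 20 examples, 4 features, 3 classes.
rng = np.random.default_rng(0)
X_toy = rng.random((20, 4))
Y_toy = np.eye(3)[rng.integers(0, 3, size=20)]  # one-hot labels
W_toy, b_toy = np.zeros((4, 3)), np.zeros(3)

loss_before = mean_cross_entropy(W_toy, b_toy, X_toy, Y_toy)
for _ in range(100):
    W_toy, b_toy = sgd_step(W_toy, b_toy, X_toy, Y_toy, learning_rate=0.05)
loss_after = mean_cross_entropy(W_toy, b_toy, X_toy, Y_toy)
print(loss_before, loss_after)
```

Starting from zero parameters, the initial loss is $\ln 3$ because every class has probability $1/3$; the updates then reduce it.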
The function $\texttt{next_batch}(s)$ returns the next batch of size $s$. It returns a pair of the form $(X, Y)$ where $X$ is a matrix of shape $(s, 784)$ and $Y$ is a matrix of shape $(s, 10)$. The function updates the global variable $\texttt{count}$, which holds the index of the first example of the next batch.
count = 0
def next_batch(size):
    global count
    X_batch = X_train[count:count+size,:]
    Y_batch = Y_train[count:count+size,:]
    count  += size
    return X_batch, Y_batch
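As an alternative to a global counter, batching can also be written as a generator. The sketch below is one possible variant, not the notebook's method: the name `iterate_minibatches` and the optional `rng` argument for shuffling are assumptions. Like the training loop in this notebook, it drops a final incomplete batch.

```python
import numpy as np

def iterate_minibatches(X, Y, size, rng=None):
    """Yield (X_batch, Y_batch) pairs of exactly `size` rows each.

    If rng is given, the examples are visited in a random order.
    """
    n = X.shape[0]
    indices = np.arange(n) if rng is None else rng.permutation(n)
    for start in range(0, n - size + 1, size):
        idx = indices[start:start + size]
        yield X[idx], Y[idx]

# Ten toy examples: row i of X_demo is [2i, 2i+1], row i of Y_demo is [i].
X_demo = np.arange(20).reshape(10, 2)
Y_demo = np.arange(10).reshape(10, 1)

batches = list(iterate_minibatches(X_demo, Y_demo, size=4))
shuffled = list(iterate_minibatches(X_demo, Y_demo, size=4,
                                    rng=np.random.default_rng(0)))
print(len(batches), batches[0][0].shape)
```

Indexing `X` and `Y` with the same index array keeps every image aligned with its label, even when the order is shuffled.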
init = tf.global_variables_initializer()
with tf.Session() as tfs:
    tfs.run(init)  # initialize W and b before training
    for epoch in range(training_epochs):
        count       = 0  # restart batching at the first training example
        avg_cost    = 0.0
        num_batches = int(num_examples / batch_size)
        # Loop over all batches
        for i in range(num_batches):
            X_batch, Y_batch = next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = tfs.run([optimizer, cost], {X: X_batch, Y: Y_batch})
            # Compute average loss
            avg_cost += c / num_batches
        print("Epoch:", '%2d,' % epoch, "cost =", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    # Test model: a prediction is correct if the most probable class equals the label
    correct = tfs.run(tf.equal(tf.argmax(Y_pred, 1), tf.argmax(Y, 1)), {X: X_test, Y: Y_test})
    print("Accuracy:", np.sum(correct) / len(correct))
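The accuracy computation can be reproduced outside of TensorFlow. The following NumPy sketch compares the most probable class of each prediction with the one-hot label; the function name `accuracy` and the toy data are assumptions for illustration.

```python
import numpy as np

def accuracy(Y_true, Y_prob):
    """Fraction of rows where the most probable class matches the label."""
    correct = np.argmax(Y_prob, axis=1) == np.argmax(Y_true, axis=1)
    return float(np.mean(correct))

# Four examples with three classes; the true classes are 0, 1, 2, 1.
Y_true = np.eye(3)[[0, 1, 2, 1]]
Y_prob = np.array([[0.8, 0.1, 0.1],   # predicts 0, correct
                   [0.2, 0.5, 0.3],   # predicts 1, correct
                   [0.6, 0.3, 0.1],   # predicts 0, wrong
                   [0.1, 0.7, 0.2]])  # predicts 1, correct
acc = accuracy(Y_true, Y_prob)
print(acc)
```

Here three of the four predictions are correct, so the accuracy is $0.75$.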
